Imperative programming
Step-by-step instructions that control exactly how the output is constructed
Hands-on, but more work
Examples in R: base graphics, grid, tile
Declarative programming
Lets the software apply a standard solution
Customize with a stylesheet
ggplot2
What is ggplot2?
A graphic maps data onto the dimensions of aesthetic objects, which are distinguished by their geometric structure and can be modified in scale and style
Users supply the data and request a geometry, leaving the rest to the software
Layered grammar of graphics
Layers: components of a graph, combined with +
Aesthetics: mappings, specified inside a layer, from data variables to visual properties such as position, color, and size
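A minimal sketch of the layered grammar, assuming the ggplot2 and gapminder packages are installed. Each piece joined by + adds one layer or component, and aes() maps data variables to visual properties. The choice of a linear trend and a log10 axis is illustrative only.

```r
library(ggplot2)   # plotting
library(gapminder) # gapminder dataset

# Base: the data plus a global aesthetic mapping shared by all layers
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  # Layer 1: points; color is mapped to continent for this layer only
  geom_point(aes(color = continent), alpha = 0.4) +
  # Layer 2: a linear trend computed from the same x/y mapping
  geom_smooth(method = "lm", formula = y ~ x) +
  # Scale component: show GDP per capita on a log10 axis
  scale_x_log10()

print(p)
```

Because color is set inside geom_point() rather than in ggplot(), the trend line is fitted to all countries at once instead of once per continent.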
Elements of information visualization
Objects
Known as geoms in ggplot2; a geom specifies how the data are drawn on the plot
Points or text can represent location: geom_point(), geom_text(), geom_label()
Lines can represent numerical values and relationships: geom_line(), geom_smooth()
Polygons can represent area or size: geom_rect(), geom_bar()
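A short sketch of the three kinds of geoms listed above, assuming the gapminder, dplyr, and ggplot2 packages are installed. The object names (oc07, p_points, p_lines, p_bars) are hypothetical. The bar example uses geom_col(), the variant of geom_bar() that takes bar heights directly from y instead of counting rows.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Hypothetical subset for illustration: Oceanian countries in 2007
oc07 <- filter(gapminder, continent == "Oceania", year == 2007)

# Points show location; text labels identify each point
p_points <- ggplot(oc07, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text(aes(label = country), vjust = -0.8)

# Lines show how a numerical value changes over time
p_lines <- gapminder %>%
  filter(continent == "Oceania") %>%
  ggplot(aes(x = year, y = lifeExp, color = country)) +
  geom_line()

# Bars show size; geom_col() uses pop as the bar height
p_bars <- ggplot(oc07, aes(x = country, y = pop)) +
  geom_col()
```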
Aesthetics
Colors: use the default colors, ColorBrewer palettes, or other R palettes, and set the fill or color argument
Line types: set the linetype argument with an integer (0 to 6) or a name such as "dashed" (see the ggplot2 aesthetic specifications vignette, vignette("ggplot2-specs"))
Point shapes: set the shape argument with an integer (0 to 25) or a name such as "circle" (see the same vignette)
Stacked bar plots or histograms: map a grouping variable to the fill aesthetic
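A sketch of the aesthetics above, assuming gapminder, dplyr, and ggplot2 are installed. The palette ("Dark2"), line types, and shape codes are arbitrary choices for illustration: shape 16 is a filled circle and 17 a filled triangle. Mapping color, linetype, and shape to the same variable merges them into a single legend.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

oceania <- filter(gapminder, continent == "Oceania")

# Color from a ColorBrewer palette; linetype and shape also mapped to country
p_aes <- ggplot(oceania, aes(x = year, y = lifeExp,
                             color = country, linetype = country,
                             shape = country)) +
  geom_line() +
  geom_point(size = 2) +
  scale_color_brewer(palette = "Dark2") +
  scale_linetype_manual(values = c("solid", "dashed")) +
  scale_shape_manual(values = c(16, 17))

# Stacked bars: mapping continent to fill stacks the counts within each year
p_stack <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  ggplot(aes(x = factor(year), fill = continent)) +
  geom_bar()
```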
Components
Title
Legend
Annotations
Labels
Background
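A sketch that touches each component listed above, assuming gapminder, dplyr, and ggplot2 are installed. The annotation text and its position are placeholders chosen for illustration, and theme_minimal() is just one of several built-in backgrounds.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

oceania <- filter(gapminder, continent == "Oceania")

p_comp <- ggplot(oceania, aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  labs(title = "Life expectancy in Oceania",       # title
       x = "Year", y = "Life expectancy (years)",  # axis labels
       color = "Country") +                        # legend title
  annotate("text", x = 1980, y = 80,
           label = "Example annotation") +         # placeholder annotation
  theme_minimal() +                                # background
  theme(legend.position = "bottom")                # legend placement
```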
Steps to create a ggplot2 graphic
Establish a mapping between data variables and plotting dimensions or elements
Apply the mapping to one or more standardized aesthetic elements
Draw the resulting set of graphical objects
An illustration
How did GDP per capita change over time in Oceanian countries?
options(scipen = 999)  # prevent scientific notation (e.g., 1e+05) in printed numbers
library(gapminder)     # load the gapminder dataset from the gapminder package
library(ggplot2)       # load ggplot2 for visualization
library(tidyverse)     # load tidyverse for data wrangling (also attaches ggplot2)
str(gapminder)         # examine the gapminder dataset
oceania <- gapminder %>% filter(continent == "Oceania")  # subset used in the plot below
ggplot(oceania,                                      # input the data
       aes(x = year, y = gdpPercap,
           color = country, linetype = country)) +   # establish aesthetic mappings
  geom_line(linewidth = 1) +                         # apply mapping to geom objects (linewidth needs ggplot2 >= 3.4.0)
  ggtitle("GDP per capita in Oceanian countries over time") +  # add title
  labs(x = "Year", y = "GDP per capita") +           # add labels
  theme_bw()                                         # white background with grey gridlines
# show frequencies of a variable
gapminder %>%
  filter(year == 1952) %>%
  ggplot(aes(x = lifeExp)) +
  geom_histogram(binwidth = 2) +
  theme_light() +
  labs(x = "Life Expectancy", y = "Count", title = "Life Expectancy in 1952")
# show the estimated density of a variable
gapminder %>%
  filter(year == 1952) %>%
  ggplot(aes(x = lifeExp)) +
  geom_density(linewidth = 1.5, alpha = 0.2, fill = "red") +  # linewidth sets the outline width
  theme_light() +
  labs(x = "Life Expectancy", y = "Density", title = "Life Expectancy in 1952")
# show the distribution of a variable (median, 1st and 3rd quartiles, outliers)
gapminder %>%
  filter(year == 1952, continent == "Europe") %>%
  ggplot(aes(y = lifeExp)) +
  geom_boxplot(fill = "grey", color = "blue", outlier.shape = 1) +  # adjust aesthetics; shape 1 is a hollow circle
  theme_light() +
  labs(title = "Life Expectancy in 1952 (Europe)", y = "Life Expectancy", x = "")
# show the distribution of a discrete variable
gapminder %>%
  filter(year == 2007) %>%                       # one row per country, so bars count countries
  ggplot(aes(x = continent, fill = continent)) + # differentiate the fill colors
  geom_bar() +
  theme_classic() +
  labs(y = "Number of countries", x = "Continent")
americas <- gapminder %>%
  filter(year == 2007 & continent == "Americas") %>%
  arrange(gdpPercap) %>%
  mutate(country = factor(country, levels = country))  # fix the level order so the y-axis is sorted by GDP per capita

ggplot(americas, aes(x = gdpPercap, y = country)) +
  geom_segment(aes(x = 0, xend = gdpPercap, y = country, yend = country),
               color = "black") +                      # the stick of each lollipop
  geom_point(colour = "blue", size = 2, alpha = 0.8) +  # the head of each lollipop
  scale_x_continuous(expand = c(0, 0),
                     limits = c(0, max(americas$gdpPercap) * 1.1),
                     labels = scales::dollar) +
  theme_bw()
gapminder %>%
  filter(year > 1990) %>%
  group_by(year, continent) %>%
  summarise(totalpop = sum(as.double(pop))) %>%  # as.double() avoids integer overflow in the sum
  ggplot(aes(x = year, y = totalpop, fill = continent)) +
  geom_col(position = "dodge", linewidth = 0.2, alpha = 0.8) +  # dodge overlapping bars side by side
  scale_x_continuous(breaks = seq(1992, 2007, 5), expand = c(0, 0)) +
  scale_y_continuous(labels = scales::comma, expand = c(0, 0)) +
  scale_fill_brewer(palette = "Set1") +
  theme_bw()
Practice
Exercise 1: Make a scatter plot with average GDP per capita across all countries on the y-axis and year on the x-axis.
Exercise 2: Break down the plot from Exercise 1 by continent, using colors to distinguish the points and putting mean GDP per capita on a log scale.
Exercise 3: Make a collection of bar plots faceted by year that compare mean GDP per capita across continents in a given year. Orient the plots so that the continent labels are easy to read.
Exercise 4: What is the relationship between life expectancy and GDP per capita in 2007, for each continent other than Oceania?
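One possible solution to Exercise 1, sketched under the assumption that "average" means the unweighted mean across countries in each year (a population-weighted mean would be a different, equally defensible choice). The name ex1 is hypothetical.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

ex1 <- gapminder %>%
  group_by(year) %>%                          # one group per year
  summarize(meanGDPpc = mean(gdpPercap)) %>%  # unweighted mean across countries
  ggplot(aes(x = year, y = meanGDPpc)) +
  geom_point() +
  labs(x = "Year", y = "Mean GDP per capita")
```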
# Exercise 2 (one possible solution)
gapminder %>%
  group_by(year, continent) %>%               # aggregate by year and continent
  summarize(meanGDPpc = mean(gdpPercap)) %>%
  ggplot(aes(x = year, y = meanGDPpc, color = continent)) +
  geom_point() +
  scale_y_log10()                             # apply the log scale to mean GDP per capita
# Exercise 3 (one possible solution)
gapminder %>%
  group_by(year, continent) %>%
  summarize(meanGDPpc = mean(gdpPercap)) %>%
  ggplot(aes(x = continent, y = meanGDPpc)) +
  geom_col() +
  facet_wrap(~ year) +  # one panel per year
  coord_flip()          # flip the coordinates so that the continent names are readable
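One possible approach to Exercise 4, assuming a log10 scale for GDP per capita and a simple linear trend within each continent. Both are modeling choices rather than requirements of the exercise, and the name ex4 is hypothetical. Because ggplot2 applies scale transformations before computing stats, the fitted line is linear in log10(GDP per capita).

```r
library(ggplot2)
library(dplyr)
library(gapminder)

ex4 <- gapminder %>%
  filter(year == 2007, continent != "Oceania") %>%           # keep 2007, drop Oceania
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear trend per panel
  scale_x_log10(labels = scales::dollar) +
  facet_wrap(~ continent) +                                  # one panel per continent
  labs(x = "GDP per capita (log scale)", y = "Life expectancy")
```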